Abstract:In anomaly segmentation for complex driving scenes, state-of-the-art approaches utilize anomaly scoring functions to calculate anomaly scores. For these functions, accurately predicting the logits of inlier classes for each pixel is crucial for precisely inferring the anomaly score. However, in real-world driving scenarios, the diversity of scenes often results in distorted manifolds of pixel embeddings in embedding space. This effect is not conducive to directly using the pixel embeddings for the logit prediction during inference, a concern overlooked by existing methods. To address this problem, we propose a novel method called Random Walk on Pixel Manifolds (RWPM). RWPM utilizes random walks to reveal the intrinsic relationships among pixels to refine the pixel embeddings. The refined pixel embeddings alleviate the distortion of manifolds, improving the accuracy of anomaly scores. Our extensive experiments show that RWPM consistently improve the performance of the existing anomaly segmentation methods and achieve the best results. Code: \url{https://github.com/ZelongZeng/RWPM}.
Abstract:Recent deep metric learning (DML) methods typically leverage solely class labels to keep positive samples far away from negative ones. However, this type of method normally ignores the crucial knowledge hidden in the data (e.g., intra-class information variation), which is harmful to the generalization of the trained model. To alleviate this problem, in this paper we propose Online Batch Diffusion-based Self-Distillation (OBD-SD) for DML. Specifically, we first propose a simple but effective Progressive Self-Distillation (PSD), which distills the knowledge progressively from the model itself during training. The soft distance targets achieved by PSD can present richer relational information among samples, which is beneficial for the diversity of embedding representations. Then, we extend PSD with an Online Batch Diffusion Process (OBDP), which is to capture the local geometric structure of manifolds in each batch, so that it can reveal the intrinsic relationships among samples in the batch and produce better soft distance targets. Note that our OBDP is able to restore the insufficient manifold relationships obtained by the original PSD and achieve significant performance improvement. Our OBD-SD is a flexible framework that can be integrated into state-of-the-art (SOTA) DML methods. Extensive experiments on various benchmarks, namely CUB200, CARS196, and Stanford Online Products, demonstrate that our OBD-SD consistently improves the performance of the existing DML methods on multiple datasets with negligible additional training time, achieving very competitive results. Code: \url{https://github.com/ZelongZeng/OBD-SD_Pytorch}
Abstract:Most deep metric learning (DML) methods employ a strategy that forces all positive samples to be close in the embedding space while keeping them away from negative ones. However, such a strategy ignores the internal relationships of positive (negative) samples and often leads to overfitting, especially in the presence of hard samples and mislabeled samples. In this work, we propose a simple yet effective regularization, namely Listwise Self-Distillation (LSD), which progressively distills a model's own knowledge to adaptively assign a more appropriate distance target to each sample pair in a batch. LSD encourages smoother embeddings and information mining within positive (negative) samples as a way to mitigate overfitting and thus improve generalization. Our LSD can be directly integrated into general DML frameworks. Extensive experiments show that LSD consistently boosts the performance of various metric learning methods on multiple datasets.
Abstract:The large variation of viewpoint and irrelevant content around the target always hinder accurate image retrieval and its subsequent tasks. In this paper, we investigate an extremely challenging task: given a ground-view image of a landmark, we aim to achieve cross-view geo-localization by searching out its corresponding satellite-view images. Specifically, the challenge comes from the gap between ground-view and satellite-view, which includes not only large viewpoint changes (some parts of the landmark may be invisible from front view to top view) but also highly irrelevant background (the target landmark tend to be hidden in other surrounding buildings), making it difficult to learn a common representation or a suitable mapping. To address this issue, we take advantage of drone-view information as a bridge between ground-view and satellite-view domains. We propose a Peer Learning and Cross Diffusion (PLCD) framework. PLCD consists of three parts: 1) a peer learning across ground-view and drone-view to find visible parts to benefit ground-drone cross-view representation learning; 2) a patch-based network for satellite-drone cross-view representation learning; 3) a cross diffusion between ground-drone space and satellite-drone space. Extensive experiments conducted on the University-Earth and University-Google datasets show that our method outperforms state-of-the-arts significantly.
Abstract:Most person re-identification (ReID) approaches assume that person images are captured under relatively similar illumination conditions. In reality, long-term person retrieval is common and person images are captured under different illumination conditions at different times across a day. In this situation, the performances of existing ReID models often degrade dramatically. This paper addresses the ReID problem with illumination variations and names it as {\em Illumination-Adaptive Person Re-identification (IA-ReID)}. We propose an Illumination-Identity Disentanglement (IID) network to separate different scales of illuminations apart, while preserving individuals' identity information. To demonstrate the illumination issue and to evaluate our network, we construct two large-scale simulated datasets with a wide range of illumination variations. Experimental results on the simulated datasets and real-world images demonstrate the effectiveness of the proposed framework.